Information Space Models for Data Integration, and Entity Resolution
Authors
Abstract
Geospatial information systems provide a unique frame of reference to bring together a large and diverse set of data from a variety of sources. However, automating this process remains a challenge because: 1) data (particularly from sensors) is error-prone and ambiguous, 2) analysis and visualization tools typically expect clean (or exact) data, and 3) it is difficult to describe how different data types and modalities relate to each other. In this paper we describe a data integration approach that can help address some of these challenges. Specifically, we propose a lightweight ontology for an Information Space Model (ISM). The ISM is designed to support functionality that lies between data catalogues and domain ontologies. Like data catalogues, the ISM provides metadata for data discovery across multiple, heterogeneous (often legacy) data sources, e.g., map servers, satellite images, social networks, and geospatial blogs. Like domain ontologies, the ISM describes the functional relationships between these systems with respect to entities relevant to an application, e.g., venues, actors, and activities. We suggest a minimal set of ISM objects and attributes for describing data sources and sensors relevant to data integration. We present a number of statistical relational learning techniques to represent and leverage the combination of deterministic and probabilistic dependencies found within the ISM. We demonstrate how the ISM provides a flexible language for data integration where unknown or ambiguous relationships can be mitigated.
Similar resources
Corpus based coreference resolution for Farsi text
"Coreference resolution", or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreferent when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...
The Effect of Transitive Closure on the Calibration of Logistic Regression for Entity Resolution
This paper describes a series of experiments in using logistic regression machine learning as a method for entity resolution. From these experiments the authors concluded that when a supervised ML algorithm is trained to classify a pair of entity references as a linked or not-linked pair, the evaluation of the model’s performance should take into account the transitive closure of its pairwise lin...
To Express Required CT-Scan Resolution for Porosity and Saturation Calculations in Terms of Average Grain Sizes
Despite advancements in specifying the 3D internal microstructure of reservoir rocks, identifying some sensitive phenomena is still problematic, particularly due to image resolution limitations. Discretization studies on such CT-scan data have always encountered conflicts in that the original data do not fully describe the real porous media. As an attractive alternative approach, one can recon...
Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination
Chemical Named Entity Recognition (NER) is the basic step for subsequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, and extraction of the names of molecules and their properties. Improvement in the performance of such systems may affect the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...
A numerical scheme for space-time fractional advection-dispersion equation
In this paper, we develop a numerical resolution of the space-time fractional advection-dispersion equation. We utilize a spectral-collocation method combined with a product integration technique in order to discretize the terms involving spatial fractional order derivatives, which leads to a simple evaluation of the related terms. By using a Bernstein polynomial basis, the problem is transformed in...